3 research outputs found
Multimodal methods for blind source separation of audio sources
This thesis focuses on enhancing the performance of frequency domain
convolutive blind source separation (FDCBSS) techniques applied to
separating audio sources recorded in a room environment. This
challenging application is termed the cocktail party problem, and the
ultimate aim is to build a machine that matches the ability of a human
being to solve this task.
Human beings exploit both their eyes and their ears in solving this task
and hence they adopt a multimodal approach, i.e. they exploit both
audio and video modalities. New multimodal methods for blind source
separation of audio sources are therefore proposed in this work as a
step towards realizing such a machine.
The geometry of the room environment is initially exploited to improve
the separation performance of an FDCBSS algorithm. The positions
of the human speakers are monitored by video cameras and this
information is incorporated within the FDCBSS algorithm in the form
of constraints added to the underlying cross-power spectral density
matrix-based cost function which measures separation performance. [Continues.]
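As a rough illustration of how video-derived speaker positions could enter a cross-power spectral density cost as a constraint, the sketch below evaluates a penalised cost for a single frequency bin. It is a minimal sketch under stated assumptions, not the formulation used in the thesis: the function names (offdiag, steering_matrix, constrained_cost), the free-field propagation model, the speed of sound of 343 m/s, the soft quadratic penalty and its weight lam are all hypothetical choices made for illustration.

```python
import numpy as np

def offdiag(m):
    """Return a copy of m with its diagonal set to zero."""
    return m - np.diag(np.diag(m))

def steering_matrix(mic_pos, src_pos, freq_hz, c=343.0):
    """Free-field phase model (an assumption): one column per source,
    one row per microphone, built from source-to-microphone distances."""
    dists = np.linalg.norm(mic_pos[:, None, :] - src_pos[None, :, :], axis=2)
    return np.exp(-2j * np.pi * freq_hz * dists / c)

def constrained_cost(W, R_list, D, lam=1.0):
    """Separation term: off-diagonal energy of W R W^H summed over several
    cross-power spectral density matrices R of the mixtures.
    Geometric term (hypothetical penalty): deviation of W D from identity,
    i.e. each output should pass one located speaker and reject the others."""
    sep = sum(np.linalg.norm(offdiag(W @ R @ W.conj().T), "fro") ** 2
              for R in R_list)
    geo = np.linalg.norm(W @ D - np.eye(W.shape[0]), "fro") ** 2
    return sep + lam * geo

# Hypothetical geometry: two microphones 10 cm apart, two speakers.
mics = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
srcs = np.array([[1.0, 1.0, 0.0], [-1.0, 1.0, 0.0]])
D = steering_matrix(mics, srcs, freq_hz=1000.0)

# Synthetic CPSD matrices for two time blocks with different source powers.
R_list = [D @ np.diag(p) @ D.conj().T for p in ([1.0, 2.0], [3.0, 0.5])]

W_geo = np.linalg.inv(D)            # unmixing matrix implied by the geometry
W_id = np.eye(2, dtype=complex)     # no separation at all
print(constrained_cost(W_geo, R_list, D), constrained_cost(W_id, R_list, D))
```

In this toy setting the geometry-derived unmixing matrix drives both terms towards zero, while the identity matrix is penalised by both. In practice the positions would come from the video tracker, and the cost would be minimised over W at every frequency bin.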
A new cascaded spectral subtraction approach for binaural speech dereverberation and its application in source separation
In this work we propose a new binaural spectral subtraction
method for the suppression of late reverberation. The proposed
approach is a cascade of three stages. The first two stages
exploit distinct observations to model and suppress the late
reverberation by deriving a gain function. The musical noise
artifacts generated by the processing at each stage are
compensated by smoothing the spectral magnitudes of the
weighting gains. The third stage linearly combines the gains
obtained from the first two stages and further enhances the
binaural signals. The binaural gains, obtained by independently
processing the left and right channel signals, are combined
using a new method. Experiments on real data are performed in
two contexts: dereverberation-only and joint dereverberation
and source separation. Objective results verify the suitability
of the proposed cascaded approach in both contexts.
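The building blocks named in the abstract (a subtraction gain, gain smoothing against musical noise, and a linear combination of two stage gains) can be sketched generically as follows. This is a minimal sketch under stated assumptions, not the authors' method: the function names, the power-subtraction rule, the gain floor, the first-order recursive smoothing along time and the combination weight are all hypothetical. The late-reverberation estimate is taken as given, and the binaural combination step is left out because the abstract does not describe it.

```python
import numpy as np

def subtraction_gain(sig_psd, late_psd, floor=0.1):
    """Power spectral subtraction gain per time-frequency bin.
    sig_psd: PSD of the reverberant signal, shape (freq, frames).
    late_psd: estimated PSD of the late reverberation, same shape.
    The gain is floored so that no bin is attenuated below `floor`."""
    ratio = late_psd / np.maximum(sig_psd, 1e-12)
    return np.maximum(np.sqrt(np.maximum(1.0 - ratio, 0.0)), floor)

def smooth_gain(gain, alpha=0.7):
    """First-order recursive smoothing along the frame axis, one common way
    to reduce musical noise (the smoothing rule is an assumption)."""
    out = np.empty_like(gain)
    out[:, 0] = gain[:, 0]
    for t in range(1, gain.shape[1]):
        out[:, t] = alpha * out[:, t - 1] + (1.0 - alpha) * gain[:, t]
    return out

def combine_gains(g1, g2, weight=0.5):
    """Linear combination of two stage gains with a hypothetical weight."""
    return weight * g1 + (1.0 - weight) * g2

# Toy example: 2 frequency bins, 2 frames.
sig = np.array([[4.0, 4.0], [1.0, 1.0]])
late_a = np.array([[1.0, 0.0], [2.0, 0.5]])   # stage-one estimate (made up)
late_b = np.array([[0.5, 0.5], [0.5, 0.5]])   # stage-two estimate (made up)

g1 = smooth_gain(subtraction_gain(sig, late_a))
g2 = smooth_gain(subtraction_gain(sig, late_b))
enhanced_mag = combine_gains(g1, g2) * np.sqrt(sig)  # gain applied to magnitudes
print(enhanced_mag)
```

The combined gain would be applied to the short-time spectrum of each channel before resynthesis. How the two late-reverberation estimates are obtained is the substance of the paper and is not reproduced here.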
A geometrically constrained multimodal approach for convolutive blind source separation
A novel constrained multimodal approach for convolutive blind source separation is presented which incorporates video information related to the geometrical positions of both the speakers and the microphones, and the directionality of the speakers, into the separation algorithm. The separation is performed in the frequency domain and the constraints are incorporated through a penalty function-based formulation. The separation results show a considerable improvement over traditional frequency domain convolutive BSS systems such as that developed by Parra and Spence. Importantly, the inherent permutation problem in frequency domain BSS is potentially solve